Frontiers in Genetics
Frontiers Media SA
Preprints posted in the last 90 days, ranked by how well they match Frontiers in Genetics's content profile, based on 197 papers previously published here. The average preprint has a 0.33% match score for this journal, so anything above that is already an above-average fit.
Noor, F. A.; Hossain, M.; Sarker, S. K.; Arafath, K.; Ety, S. S.; Maisha, J. A.; Mahmud-Un-Nabi, M. A.; Bhuyan, G. S.; Sultana, N.; Hossain, A. K. M. E.; Khan, W. A. K.; Shekhor, H. U.; Qadri, F.; Mannoor, K.
Patients with HbE/β-thalassemia inheriting the same β-globin mutations display varied clinical manifestations, the mechanism of which is only partially known. The study aimed to decipher the heterogeneous basis of HbE/β-thalassemia in more detail by focusing on both hematological and genetic modifiers influencing disease severity, which included: (i) HbF and HbE levels using Hb electrophoresis, (ii) β-thalassemia mutations, (iii) anti3.7 triplication using Gap-PCR, and (iv) individual and cumulative effects of HbF-inducing SNPs in 4 major modifier genes, namely HBG2, BCL11A, the HBS1L_MYB intergenic region, and HBBP1, which were genotyped using DNA sequencing and Real-time PCR-HRM methods. Accordingly, 130 diagnosed Bangladeshi patients with HbE/β-thalassemia were enrolled and categorized as mild, moderate, or severe as per the Mahidol scoring system. c.79G>A+IVS1_5G>C was the most predominant mutation pair (73.8% of total) across all 3 severity groups, indicating that secondary modifiers might influence the severity. Our study found both HbF and HbE to be protective in HbE/β-thalassemia, as both were inversely related to the severity score (HbF: p<0.0001/r=-0.55; HbE: p<0.0001/r=-0.56). Four SNPs, XmnI-Gγ, rs2071348 (HBBP1), rs489544 and rs28384513 (HBS1L_MYB), showed significant association with elevated HbF levels (p=0.005, 0.0001, 0.0001, 0.004, respectively). The multivariate analysis showed that risk genotypes with a single SNP or a combination of 2, 3, and 4 SNPs showed gradually increased risk [Odds Ratio (95%CI)= 2.51, 5.47, 19.5, 39.0, respectively] of a less severe phenotype, suggesting that these linked SNP variants had a cumulative effect on both HbF level and clinical severity score. However, a low HbE level and the co-presence of anti3.7 triplication were found to nullify the ameliorating effect of multiple SNPs.
Yakymenko, I.; Mompart, A.; Caceres, M.
Complex genomic regions harbor different structural arrangements that can mutate quite rapidly, which makes determining their functional effects very difficult. Characterization of inversions originated by homologous mechanisms is especially challenging due to the presence of inverted repeats at the breakpoints and the fact that most of them are recurrent. Imputation can infer missing genotypes, but it has been mainly limited to simple variants and little is known about how well it works for human inversions. Here, we tested five common imputation programs to impute a set of 52 inversions experimentally genotyped in multiple samples that lacked SNPs in perfect linkage disequilibrium. Using whole genome sequencing data and simulated microarrays with variable SNP density, we found that 40.4-75.5% of inversions could be accurately imputed in three human populations by at least one program, with results depending mainly on the number of SNPs available, the genotyped samples and the recurrence of inversions. Also, genotype probability filtering was a key factor for inversion imputation accuracy. In particular, Minimac4 and IMPUTE5 yielded more accurately imputed inversions and fewer poorly imputed individuals than the other methods. This work therefore contributes to optimizing inversion imputation, making it possible to study the functional impact of inversions.
Kucherenko, V.; Doroschuk, N.; Sarygina, E.; Sagaydak, O.; Bogdanov, V.; Mityaeva, O.; Krupinova, J.; Woroncow, M.; Albert, E.; Volchkov, P.
HLA loci are highly polymorphic genome regions, with allele frequencies varying significantly across different populations. Population HLA frequency databases may contain biases and complicate cross-study comparison due to varying data curation protocols, genotyping methodologies, resolution, and inconsistencies in the selection criteria for population samples. This study presents HLA allele frequencies of class I (HLA-A, -B, -C) and class II (HLA-DRB1, -DQB1, -DQA1) as well as their combined haplotypes obtained from over 18,000 whole genome sequencing samples of the Russian population. The cohort was stratified based on PCA and admixture components, providing frequencies for 14 different ethnic groups. For 12 groups, the cohort size allowed us to reach an average saturation of 96% of allele frequencies. Moreover, we demonstrated the utility of the compiled statistics for population-level disease studies using type 1 diabetes (T1D) as an example. Populations with similar aggregated genetic risk for T1D demonstrated substantial differences in frequencies of risk and protective HLA alleles. The frequency data obtained were made publicly available through the Allele Frequency Net Database, improving the previously sparse coverage of HLA frequency data for Eastern Europe and North Asia.
Rautila, O. S.; Atula, S.; Mustonen, T.; Schmidt, E.-K.; Valori, M.; Colombo, R.; Kere, J.; Kaivola, K.; Tienari, P. J.
Finnish gelsolin amyloidosis (AGel amyloidosis) is an autosomal dominant systemic amyloidosis caused by the GSN c.640G>A p.D187N (rs121909715) founder variant. The disease was first described in 1969, and it was hypothesized that the Finnish patients share a common ancestor dating back to the 14th century. The link between two Finnish regions with high AGel incidence (Kanta-Hame and Kymenlaakso) has been hypothesized to have occurred in 1365 by a settler moving from Kanta-Hame to Kymenlaakso. Here, we used a haplotype sharing tree (HST) to analyze Finnish AGel amyloidosis haplotypes to trace the geographic origin of the variant. We also estimated the time from the most recent common ancestor (MRCA) using single nucleotide polymorphism and short tandem repeat data. The HST-based analyses leveraging AGel amyloidosis cohorts from different Finnish geographic regions indicated that the variant more likely appeared first in Kymenlaakso, not Kanta-Hame, contrary to the original hypothesis. The MRCA estimates for Finnish AGel ranged from 15 to 40 generations using four different methods; the mean of all estimates (27 generations) dated back to the 14th century. Thus, the data support the original hypothesis on the variant's spread temporally, but not geographically. These results illustrate the use of HSTs in the analysis of haplotype structures and in tracing the ancestry of a founder variant.
Curtis, D.
UK Biobank has released whole genome sequence data for 500,000 participants, including allele counts for hundreds of millions of variants. These variants were considered in the context of the pentanucleotide background on which they occurred. Frequencies of singleton variants were obtained and compared with frequencies of more common variants. Results were highly correlated across chromosomes, reflecting systematic effects. C>T singleton variants were less frequent in the CG context but the opposite was true for more common variants, suggesting that they are relatively well tolerated and not subject to strong negative selection. The frequencies of singleton variant types were strongly influenced by their trinucleotide context and the total counts of variants in their trinucleotide context could be well approximated by combining five mutational signatures obtained from genomes of cancer cells. For some variant types, there were marked asymmetries in counts between plus and minus DNA strands. The patterns of these asymmetries for singleton variants differed between chromosomes, with five being negatively correlated with the rest. These asymmetries did not appear related to strand-specific gene content. It was noted that there were also strand asymmetries for some pentanucleotide sequences in the reference genome and that these were consistent across chromosomes. For example, the sequence TTCGT is seen 673300 times on the plus strand but only 465807 times on the minus strand. These findings must reflect strand-specific mechanisms affecting mutation and selection which are not currently well understood and which could be investigated further. This research has been conducted using the UK Biobank Resource.
Rodriguez-Vazquez, R.; Karami, A. M.; Robledo, D.; Buchmann, K.
Rainbow trout is affected by a broad range of pathogens causing large economic losses and animal welfare concerns. Marker-assisted selection can significantly enhance resistance to pathogens in a few generations, and to this end many studies have focused on identifying quantitative trait loci (QTLs) for resistance traits. The integration of accumulated genetic resources provides an opportunity to uncover important genetic variation and candidate genes crucially involved in rainbow trout immunity. Here, we present a comprehensive meta-QTL (MQTL) analysis based on the integration of 145 QTLs related to pathogen resistance. These QTLs were refined into 26 MQTLs, of which 15 were validated by genome-wide association studies (GWAS). The average confidence interval (CI) of these MQTLs was reduced by 2.03-fold compared to the initial QTLs, improving mapping precision. Integration of GWAS results revealed regions along the rainbow trout genome pivotal for pathogen resistance, and a major region on chromosome 3, which could be used in marker-assisted selection. Further, among the validated MQTLs we identified a subset of high-confidence MQTLs, defined as those supported by at least three initial QTLs from more than two independent studies, with a percentage of variance explained greater than 8% and a LOD score higher than three. Gene annotation identified 11 unique candidate genes within these high-confidence MQTLs involved in immune pathways, encoding proteins involved in the regulation of immune responses, signalling pathways, receptor activity, and direct immune effector production. The MQTLs and candidate genes identified are valuable resources for advancing molecular breeding and unravelling the genetic basis of pathogen resistance in rainbow trout.
Cooper, H. B.; Rojas Lopez, K. E.; Schiavinato, D.; Black, M. A.; Gardner, P. P.
Proteins and non-coding RNAs are functional products of the genome that are central to crucial cellular processes. With recent technological advances, researchers can sequence genomes in the thousands and probe numerous genomic activities of many species and conditions. Such studies have identified thousands of potential proteins, RNAs and associated activities. However, there are conflicting interpretations of the results and therefore of which regions of the genome are "functional". Here we investigate the relative strengths of associations between coding and non-coding gene functionality and genomic features, by comparing reliably annotated functional genes to non-genic regions of the genome. We find that the strongest and most consistent associations between functional genes and genomic features are with transcriptional activity and evolutionary conservation. We also evaluated sequence-based statistics, genomic repeats, epigenetic and population variation data. Other features strongly associated with function include histone marks, chromatin accessibility, genomic copy-number, and sequence alignment statistics such as coding potential and covariation. We also identify potential issues with SNP annotations in short non-coding RNAs, as some highly conserved ncRNAs have significantly higher than expected SNP densities. Our results demonstrate the importance of evolutionary conservation and transcriptional activity for indicating protein-coding and non-coding gene function. Both should be taken into consideration when differentiating between functional sequences and biological or experimental noise.
Blois, L.; Heuclin, B.; Bernard, A.; Denis, M.; Dirlewanger, E.; Foulongne-Oriol, M.; Marullo, P.; Peltier, E.; Quero-Garcia, J.; Marguerit, E.; Gion, J.-M.
Deciphering the genetic architecture of complex quantitative phenotypes remains challenging in quantitative genetics. These traits not only depend on multiple genetic factors but are also established over time and across environments. Although quantitative genetics has investigated the genetic determinism of phenotypic plasticity in contrasting environmental conditions, time-related phenotypic plasticity has received less attention. Here we propose a multivariate Bayesian framework, the Bayesian Varying Coefficient Model (BVCM), designed for analysing the genetic architecture of time-related phenotypic plasticity by a multilocus approach. We applied the BVCM to time series phenotypes measured at various time scales (daily, monthly, yearly) across a diverse set of biological species. We included in this study: yeast (Saccharomyces cerevisiae), a fungus (Fusarium graminearum), eucalyptus (Eucalyptus urophylla x E. grandis), and sweet cherry tree (Prunus avium). The BVCM results were compared with those obtained with a known genome-wide association method carried out time point by time point. For all species and traits, the BVCM was able to detect the major QTL identified by marker-trait association methods and revealed additional genetic regions of weak effect. It also increased the phenotypic variance explained for most of the phenotypes considered. It revealed dynamic QTLs with transitory, increasing or decreasing effects over time. By considering both the temporal and genetic multivariate structures in a single statistical model, we increased our understanding of the genetic architecture of complex traits, notably by reducing the issue of missing heritability. More broadly, this work lays the foundation for extended applications in functional genomics, evolutionary ecology, and crop breeding programs, in which time-related phenotypic plasticity remains crucial for predicting and selecting key quantitative complex traits.
Key message: By capturing the genetic factors influencing time-related phenotypic plasticity, our approach contributes to a deeper understanding of the dynamic nature of genotype-phenotype relationships.
Belyakin, S. N.; Maksimov, D. A.; Pobedintseva, M. A.; Laktionov, P. P.; Mikhnevich, N. V.; Sipin, F. A.; Krylova, M. I.
Alleles of the ASIP gene (Agouti locus) in dogs determine a wide spectrum of coat colors, from red to black. The gain-of-function Ay allele is the most dominant among the known ASIP mutations: when all other genes affecting coat pigmentation are intact, the presence of the Ay allele results in red coat color. The loss-of-function a allele is the most recessive allele of this gene. When homozygous, it gives black coat color. Usually, dogs with the Ay/a genotype have a red coat, because a single copy of the Ay allele is sufficient to fully compensate for the non-functional allele a, implying complete dominance in this pair of alleles. However, exceptions are known. In the Hungarian Puli breed there is a specific coat pigmentation type called fako. We investigated the genetic composition of fako dogs and found evidence that the dominance of the Ay allele over the a allele may be incomplete in these dogs. Analysis of the MC1R gene, which interacts with ASIP in the hair pigmentation genetic cascade, allowed us to find variants that may be responsible for the incomplete dominance of the Ay allele over the a allele in Hungarian Puli dogs.
Orkild, M. R.; Dybdahl, K. L.; Duun Rohde, P. D.
Inflammatory bowel disease (IBD) frequently co-occurs with immune-mediated and metabolic disorders, but whether these associations reflect shared genetics or causal effects remains unclear. We performed two-sample Mendelian randomization (MR) using large-scale genome-wide association study (GWAS) summary statistics to investigate potential causal effects of immune-mediated diseases and lifestyle traits on IBD, Crohn's disease (CD), and ulcerative colitis (UC). SNP-based heritability and genetic correlations were estimated to contextualize findings. Following false discovery rate correction, genetically predicted psoriasis was positively associated with IBD (OR 1.15), CD (OR 1.23), and UC (OR 1.10), with the strongest effect observed for CD. Genetically predicted type 2 diabetes mellitus (T2DM) showed a modest inverse association with UC (OR 0.88). No lifestyle-related traits remained significant after correction. Sensitivity analyses indicated heterogeneity across instruments and evidence of directional pleiotropy in selected models, whereas no pleiotropy was detected for the T2DM-UC association. These findings support a role of psoriasis-related immune pathways in IBD susceptibility and suggest a potential inverse association between genetic liability to T2DM and UC.
Ahmad, A.; Mustafa, H.; Khan, W. A.; Manan, A.; Anwer, I.; Akram, W.
Linkage disequilibrium (LD) and haplotype block structure govern the resolution and utility of genomic selection, marker-assisted selection, and genome-wide association studies (GWAS) in livestock. We performed a comprehensive genome-wide characterization of LD decay, haplotype block architecture, and population diversity across all 24 autosomes in Nili-Ravi buffalo (Bubalus bubalis; n = 85), using 43,543 post-quality-control SNPs. Mean genome-wide r² was 0.124 (median 0.074) and mean D was 0.540 (median 0.481), with LD half-decay at ≈70 kb. A total of 133 haplotype blocks encompassing 721 SNPs were identified (Gabriel et al., 2002). Haploview analysis of nine chromosomes harbouring bTB resistance candidate genes revealed contrasting selection signatures: directional selection at innate immune loci (IFNG, TLR1; H < 0.55) versus balancing selection at adaptive immune loci (BoLA-DRB3, SP110; H > 1.0). Critically, BBU15 Block 3 (28.6 kb; OR52E5/NCR1 locus, 47.16 Mb) showed a genome-wide significant integrated haplotype score (iHS; -log10 p = 5.408), directly co-localising with the published bTB susceptibility QTL (Bermingham et al., 2014). The TAA haplotype (frequency 53.3%) at this block represents a candidate resistance-associated haplotype for marker-assisted selection. These findings provide essential parameters for SNP panel design and bTB resistance breeding in South Asian buffalo.
Lamon, S.; Bourke, P. M.; Abernathy, B. L.; dos Santos, J. F.; de Godoy, I. J.; Leal-Bertioli, S. C. M.; Bertioli, D. J.
Polyploidization in peanut (Arachis hypogaea L.) provided evolutionary advantages by increasing heterosis and the response to selection, and by enhancing adaptability. However, it also caused a genetic bottleneck by isolating cultivated peanut from its wild diploid relatives. Mechanisms such as homoeologous exchange can partially restore genetic diversity by generating new allelic combinations. Double reduction is a rare segregation pattern restricted to polyploids, in which a single-dosage locus yields duplex gametes. It requires multivalent formation and crossing over between non-sister chromatids, both of which are associated with homoeologous exchange. Although peanut mainly exhibits disomic pairing, occasional multivalents theoretically allow low-frequency double reduction. To estimate double reduction and examine its relationship with genetic instability, a high-density phased linkage map was constructed using a backcross population from a cross between a neoallotetraploid [A. magna K 30097 x A. stenosperma V 15076]4x (MagSten) and cultivated peanut. The final map included 9,717 SNP markers with an average spacing of 0.22 centiMorgans. Some progenies showed unbalanced genomic compositions, creating artifacts in linkage analysis. Removing these progenies improved the map and suggested a common origin for artifacts previously observed in other linkage maps, revealing a novel aspect of mapping in allotetraploid peanut. Analysis of the phased map revealed double reduction in 12% of progenies. Notably, one event produced a genomic composition consistent with theoretical predictions, supporting the expectation that double reduction causes unbalanced genomes in allopolyploids. These results indicate that double reduction is a low-frequency but significant genetic phenomenon in the segmental allotetraploid peanut, contributing to the genetic instability and evolutionary dynamics of this and likely other allopolyploid genomes.
Article Summary: This study investigated double reduction, a rare genetic event in segmental allopolyploid peanut, which can create unbalanced genomic compositions and affect genetic diversity. We generated a backcross population using neoallotetraploid and cultivated peanuts, then constructed a high-density phased linkage map. Analysis revealed unbalanced genomic compositions in some progenies caused by homoeologous exchanges, which reduced map quality. Double reduction was estimated to occur in approximately 12% of progenies, aligning with theoretical expectations for genomic imbalance. These results demonstrate that double reduction contributes to genetic instability, inheritance patterns, and genome evolution in allopolyploid organisms such as peanut.
Shen, J.; Tang, S.; Xia, Y.; Qin, J.; Xu, H.; Tan, Z.
Background: Conventional models of human ribosomal DNA (rDNA) array organization have historically depended on transcription-centric boundaries, partitioning the unit into a ~13 kb rDNA transcription region and a monolithic ~31 kb intergenic spacer (IGS). While our previous identification of Duplication Segment Units (DSUs) mapped these arrays based on an intuitive analysis of the microsatellite density landscape of the complete reference human genome, our present deep mining of this landscape has revealed a more accurate rDNA Gene Unit Pattern. Methods & Results: In this study, we conducted a deep mining analysis of our previously established microsatellite density landscape of the T2T-CHM13 assembly, focusing specifically on nucleolar organizing regions (NORs). We suggest a more accurate rDNA Gene Unit Pattern containing a (CTTT)n microsatellite aggregation ahead of the rDNA gene and a (CT)n microsatellite aggregation behind the gene, rather than a pattern featuring an IGS region inserted between two rDNA genes. Conclusions: A correct rDNA gene pattern of the human genome probably includes a (CTTT)n microsatellite aggregation ahead of the gene and a (CT)n microsatellite aggregation behind it, which possibly constitute cis- and trans-regulating regions; the (CTTT)n and (CT)n microsatellite aggregations may provide two different local stable DNA structures for regulatory protein binding.
Gordillo-Gonzalez, F.; Galiana-Rosello, C.; Grillo-Risco, R.; Soler-Saez, I.; Hidalgo, M. R.; Siomi, H.; Kobayashi-Ishihara, M.; Garcia-Garcia, F.
We present a novel integrative analysis of transposable elements (TEs) in 4 single cell RNA-seq (scRNA-seq) datasets of postmortem substantia nigra pars compacta samples from Parkinson Disease (PD) patients and matched healthy controls, with the objective of building a trustworthy cell-type-specific atlas of TEs that may clarify the role of TEs in sex differences in PD. We used the soloTE tool to evaluate TE expression changes across all snRNA-seq studies identified in our previous systematic review, and then integrated the results using meta-analysis techniques. Finally, we evaluated possible associations between TEs and protein-coding genes by integrating our previous results on this matter with the TE information obtained, in order to propose a possible mechanism of action by which some of the TEs contribute to PD.
Rodriguez-Vazquez, R.; Mukiibi, R.; Ferraresso, S.; Franch, R.; Peruzza, L.; Rovere, G. D.; Radojicic, J.; Babbucci, M.; Bertotto, D.; Toffan, A.; Pascoli, F.; Penaloza, C.; Houston, R. D.; Tsigenopoulos, C. S.; Bargelloni, L.; Robledo, D.
MicroRNAs (miRNAs) are key post-transcriptional regulators of antiviral immunity, controlling gene expression by targeting the 3′ UTRs of immune-related transcripts. Despite their importance, the role of miRNAs in viral nervous necrosis (VNN) resistance in European seabass (Dicentrarchus labrax) is unexplored. Here, we characterized for the first time the brain miRNome of seabass from three VNN-resistance genotypes (susceptible, intermediate, resistant) across two genetically distinct seabass clusters. Differential expression analyses revealed cluster-specific patterns, with susceptible fish consistently showing overexpression of the differentially expressed miRNAs (DEmiRNAs) as compared to the resistant fish. Across the two genetic clusters in the study, miR-199-5p was differentially expressed between the VNN susceptible and resistant fish. This miRNA was found to be less expressed in the resistant individuals. Functional characterization of the miRNA predicted that it binds to two distinct miRNA recognition elements (MREs) within the ifi27l2a 3′ UTR. These MREs flank a SNP (Chr3:10,082,380) previously associated with VNN survival. A strong negative correlation (r = -0.840) between miR-199-5p expression and ifi27l2a mRNA abundance further supports a post-transcriptional repression mechanism. Together, these results suggest a regulatory model in which miR-199-5p modulates ifi27l2a expression, contributing to phenotypic variation in VNN resistance and positioning it as a promising biomarker for seabass aquaculture breeding.
Alvarez Jerez, P.; Rhie, A.; Kim, J.; Hebbar, P.; Nag, S.; Antipov, D.; Koren, S.; Lara, E.; Beilina, A.; Hansen, N. F.; Arber, C. F.; Zulueta, J.; Wild-Crea, P.; Patel, D.; Hickey, G.; Waltz, B.; Malik, L.; Skarnes, W. C.; Reed, X.; Genner, R.; Daida, K.; Pantazis, C. B.; Grenn, F.; Nalls, M. A.; Billingsley, K.; Fossati, V.; Wray, S.; Ward, M.; Ryten, M.; Cookson, M. R.; Jain, M.; Paten, B.; Phillippy, A. M.; Blauwendraat, C.
While induced pluripotent stem cells (iPSCs) have gained popularity in studying neurodegenerative diseases, the heterogeneity of stem cells used across studies impacts cross-study comparison. The iPSC Neurodegenerative Disease Initiative (iNDI) selected the KOLF2.1J cell line and prioritized its use as a reference standard for studying the effects of pathogenic variants on cell biology due to its stability and neutral neurodegenerative disease genetic risk. This cell line, and its derivatives expressing over 100 variants related to Alzheimer's disease, Parkinson's disease, and other neurological diseases, are available for academic and industry access. Current genomic data analyses are limited by the use of a human reference genome that does not capture the complete genetic background of a given iPSC line. While in the future this issue may be partially mitigated by the creation of a comprehensive human pangenome, previous work has shown that generating custom genomes is of value both to characterize the variation present and to serve as a more appropriate genomic reference. Here, we generated and characterized a custom complete genome assembly from KOLF2.1J. Mapping of sequencing reads to a personalized diploid assembly results in more comprehensive mapping compared to traditional linear references (i.e., GRCh38). In addition, we provide a comprehensive custom gene annotation along with isoform expression and differential methylation analyses across multiple cell types. The assembly and all additional data are browsable and publicly available. This resource will enable more accurate investigation of the KOLF2.1J cell line and any genomics data generated from it than traditional generalized references allow, while also serving as a foundational approach for establishing custom reference assemblies for other high-value iPSC lines.
Thiyagarajan, K.; Pierre, C. S.; Kumar, C.; Sanyal, D.; Thakur, G.; Singh, D.; Thakur, D.; Tomar, A.; Vikram, P.; Valluru, R.
Phosphorus Starvation Tolerance 1 in rice (OsPSTOL1, also known as Phosphorus uptake 1, Pup1) is a receptor-like cytoplasmic protein kinase that confers tolerance to phosphorus deficiency. The OsPSTOL1 gene possesses a Ser/Thr kinase domain and shows high amino-acid sequence similarity with the leaf rust receptor-like kinase (OsLrK10). We hypothesize that the putative wheat TaPSTOL1 and TaLrK10 have a common ancestral origin and that putative TaPSTOL1 diverged recently, acquiring new structural modifications and biological functions in the process. In this study, we identified all putative TaPSTOL1 homeologs and examined the evolutionary relationship between TaPSTOL1 and TaLrK10 in Triticum species. Our results indicate that the putative TaPSTOL1 diverged recently without possessing the amino-terminal domain, which is a typical characteristic of TaLrK10. We observed numerous conversion tracts between these two genes, and the substitution pattern of randomly selected amino acids indicates that dynamic selection pressures acted on both genes. The putative TaPSTOL1 shows high nucleotide diversity compared to TaLrK10 within Triticum species. Further, a multiple-sequence analysis reveals that the third exon of TaLrK10 appears to have been duplicated and to have diverged as a putative single-exon TaPSTOL1 in bread wheat. Overall, our comparative analysis indicates that both TaPSTOL1 and TaLrK10 appear to have diverged from a common ancestor, acquiring distinct structural organizations and biological functions.
Izquierdo, P.; Weng, X.; Juenger, T.; Bonnette, J. E.; Yoshinaga, Y.; Daum, C.; Lipzen, A.; Barry, K.; Blow, M. J.; Lehti-Shiu, M. D.; Lowry, D.; Shiu, S.-H.
Uncovering the genetic architecture of quantitative traits is challenging because polygenic control yields small individual gene effects and because gene-gene and genotype-by-environment interactions add further complexity. To understand the genetic basis of polygenic traits and their plasticity across environments, we integrated genome-wide SNPs and RNA-seq transcript data with interpretable statistical and machine learning models in a switchgrass (Panicum virgatum) diversity panel grown at contrasting field sites in Michigan and Texas. Notably, in addition to traits in single environments, our trait prediction models were able to predict phenotypic differences across environments, i.e., plasticity. By interpreting trait prediction models with explainable artificial intelligence methods, we identified important features, namely genes that are the most predictive of flowering time and annual biomass production across environments, based on their associated gene expression levels and nearby SNPs. This approach recovered canonical flowering regulators and revealed novel, environment-specific candidate flowering genes. Further, transcriptome models consistently recovered more switchgrass genes homologous to experimentally validated genes in Arabidopsis and rice than SNP-based models. Feature interaction scores from the models also allow the identification of trait- and environment-dependent gene-gene interactions, where flowering time showed stronger and more abundant interactions than biomass. While some of the interactions identified are consistent with the link between flowering time and yield, most are novel predictors that need to be further evaluated. Together, these results demonstrate that interpretable genomic prediction with explainable artificial intelligence approaches can convert trait prediction models into mechanistic hypotheses about putative causal genes and interactions controlling traits within and across environments.
These results will help to prioritize target genes for validation and inform germplasm selection for cultivar improvement.
Seerley, A. L.; Rothfuss, M. T.; Gray, B. M.; Sebogo, M. A.; Manakelew, B. A.; Pounder, J. I.; Bowler, B. E.; Leavens, M. J.; Grindeland Panter, A. L.
Chronic Wasting Disease (CWD) is a transmissible spongiform encephalopathy (TSE) of cervids (elk, deer, moose, and reindeer) that is increasing in prevalence and expanding to new geographical areas. TSEs, commonly referred to as prion diseases, are fatal neurodegenerative diseases that occur in a variety of mammals, including humans, and typically exhibit species-specific characteristics. This study reports the sequencing of the prion protein gene (PRNP) in retropharyngeal lymph node samples from 358 Montana mule deer (Odocoileus hemionus) and the identification of 36 PRNP genetic variants, many of which have not been reported previously. Further investigations tracked spatiotemporal characteristics of variants to hunting districts, year of harvest, and CWD status. PRNP polymorphisms V12F, D20G, R40Q, and S225F were examined with EmCAST computational predictions to determine the relationship between sequence and structural variations, providing further insights into mechanisms affecting CWD outcomes. EmCAST predictions suggest the novel variant V12F phenotype is attributable to functional changes such as altered protein-protein interactions that might be linked to the CWD-positive status of the samples. Notably, the analysis of S225F by EmCAST predicted that S225F is a neutral mutation for folded PrP and incompatible with fibril PrP, suggesting a potential structural mechanism for why this previously known variant may provide protection against CWD based on reduced fibril PrP formation. The CWD-positive samples harboring PRNP variants were examined with the prion RT-QuIC assay, including the novel variant V12F, which resulted in prion seeding activity. Author Summary: Chronic Wasting Disease (CWD) is a fatal disease of cervids, which include deer, elk, and moose. Since its discovery in 1967, CWD has spread to 36 U.S. states and four Canadian provinces, with prevalences exceeding 20% in select free-ranging populations.
With the popularity of hunting big game animals and the role of these species in the ecosystem, concerns have arisen regarding the transmission of disease to humans, as well as how to mitigate the long-term consequences of disease on animal populations. Given the significant risk of species spillover and the limitations of current management, innovative genetic research is essential. Our study identified novel PRNP genetic variants in Montana mule deer, cataloging their regional distribution and CWD status across several hunting seasons. By investigating the impact of these polymorphisms on protein stability and seeding activity, we provide critical insights into the genetic factors that influence disease phenotypes and transmissibility in wild cervid populations.
Skvortsova, L.; Yergali, K.; Zhaxylykova, A.; Begmanova, M.; Mansharipova, A.
Genome-wide association studies (GWAS) of ischemic heart disease (IHD) remain underrepresented in Central Asian populations. We conducted a pilot GWAS of IHD with co-occurring arterial hypertension in a Kazakh cohort to identify candidate loci for future replication. A case-control GWAS was performed in 451 individuals (236 cases and 215 controls). Genotyping was conducted using the Illumina Infinium Global Screening Array-24 v3.0. Association testing was performed using logistic regression under an additive genetic model, adjusted for age, sex, and the first ten principal components (PC1-PC10). Multiple testing correction was applied using the Bonferroni adjustment. As an additional analysis, knowledge-guided GWAS (KGWAS) followed by MAGMA gene-based testing was used to prioritize candidate genes. After quality control, 345,371 variants were tested. Two loci surpassed the Bonferroni-corrected genome-wide significance threshold: rs28898595 at the UGT1A locus (effect allele C; OR = 0.33, 95% CI = 0.23-0.49; p = 3.01 × 10^-8) and rs28709059 in an intronic region of ACTR3C (effect allele C; OR = 0.4, 95% CI = 0.29-0.55; p = 4.08 × 10^-8). Several additional loci showed suggestive evidence of association. In gene-level analysis, CSMD1 showed a significant association signal in MAGMA that was consistent across the European (p = 1.16 × 10^-11) and East Asian (p = 9.07 × 10^-11) LD reference panels. This pilot study identifies genome-wide significant loci (UGT1A and ACTR3C) and supports CSMD1 as a prioritized candidate gene for the complex phenotype of IHD with co-occurring arterial hypertension in the Kazakh cohort. These findings are preliminary and require replication in larger Central Asian cohorts and further functional validation.
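The Bonferroni step in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a family-wise error rate of 0.05, which the abstract does not state, and the function and variable names are hypothetical. The variant count and the two p-values are taken from the abstract.

```python
# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
ALPHA = 0.05
# Number of variants tested after quality control (from the abstract).
N_VARIANTS = 345_371


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold alpha / n_tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Lead-variant p-values as reported in the abstract.
reported_p = {
    "rs28898595": 3.01e-8,  # UGT1A locus
    "rs28709059": 4.08e-8,  # ACTR3C intron
}

threshold = bonferroni_threshold(ALPHA, N_VARIANTS)

# A variant passes if its p-value is below the corrected threshold.
passes = {snp: p < threshold for snp, p in reported_p.items()}

print(f"Bonferroni threshold: {threshold:.3e}")
for snp, ok in passes.items():
    print(f"{snp}: p = {reported_p[snp]:.2e}, passes = {ok}")
```

Under this assumption the threshold is roughly 1.45 × 10^-7, so both reported variants pass it; both also fall below the conventional 5 × 10^-8 genome-wide cutoff.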